Course homepage: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/

Video: https://www.bilibili.com/video/av46216519?from=search&seid=13229282510647565239

This post reviews CS224N Assignment 3. Reference solutions:

https://github.com/ZacBi/CS224n-2019-solutions

1. Machine Learning & Neural Networks

(a)

(i)

This update rule computes a weighted rolling average of past gradients, so it changes little from one step to the next; the lower variance of the averaged gradient reduces oscillation during training.
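
For reference, the momentum update from the handout, which makes the averaging explicit:

$$
\mathbf{m} \leftarrow \beta_1 \mathbf{m} + (1 - \beta_1)\,\nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta}), \qquad \boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha\,\mathbf{m}
$$

Unrolling the recursion, $\mathbf{m}$ is an exponentially weighted average of all past gradients, with weight $(1-\beta_1)\beta_1^k$ on the gradient from $k$ steps ago; with $\beta_1 \approx 0.9$, the noise in individual minibatch gradients largely cancels out.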

(ii)

Parameters whose gradients have historically been small receive relatively larger updates, while those with large gradients receive smaller ones. This keeps the update magnitudes roughly comparable across directions, which again reduces oscillation.
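
Again from the handout, the second-moment update and the scaled step ($\odot$ and the division are element-wise):

$$
\mathbf{v} \leftarrow \beta_2 \mathbf{v} + (1 - \beta_2)\,\big(\nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta}) \odot \nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta})\big), \qquad \boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha\,\mathbf{m} / \sqrt{\mathbf{v}}
$$

Dividing by $\sqrt{\mathbf{v}}$ shrinks the step in directions whose gradients have been consistently large and enlarges it where they have been small.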

(b)

(i)

We need $\mathbb{E}_{p_{\text{drop}}}[\mathbf{h}_{\text{drop}}]_i = h_i$. Each entry is kept with probability $1 - p_{\text{drop}}$ and scaled by $\gamma$, so $\mathbb{E}_{p_{\text{drop}}}[\mathbf{h}_{\text{drop}}]_i = \gamma (1 - p_{\text{drop}}) h_i$. Therefore

$$
\gamma = \frac{1}{1 - p_{\text{drop}}}
$$

(ii)

Dropout is applied during training so that the network effectively trains an ensemble of thinned sub-networks, which improves generalization; during evaluation we want a deterministic, accurate prediction from the full network, so dropout is disabled.
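
In PyTorch this train/eval distinction is just the module's mode switch; a minimal sketch:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 6)

drop.train()    # training mode: zeros entries with prob p, scales survivors by 1/(1-p)
print(drop(x))  # e.g. tensor([[2., 0., 2., 2., 0., 2.]]) -- random each call

drop.eval()     # evaluation mode: dropout becomes the identity
print(drop(x))  # tensor([[1., 1., 1., 1., 1., 1.]])
```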

2. Neural Transition-Based Dependency Parsing

(a)

| Stack | Buffer | New dependency | Transition |
| --- | --- | --- | --- |
| [ROOT] | [I, parsed, this, sentence, correctly] | | Initial Configuration |
| [ROOT, I] | [parsed, this, sentence, correctly] | | SHIFT |
| [ROOT, I, parsed] | [this, sentence, correctly] | | SHIFT |
| [ROOT, parsed] | [this, sentence, correctly] | parsed$\to$I | LEFT-ARC |
| [ROOT, parsed, this] | [sentence, correctly] | | SHIFT |
| [ROOT, parsed, this, sentence] | [correctly] | | SHIFT |
| [ROOT, parsed, sentence] | [correctly] | sentence$\to$this | LEFT-ARC |
| [ROOT, parsed] | [correctly] | parsed$\to$sentence | RIGHT-ARC |
| [ROOT, parsed, correctly] | [] | | SHIFT |
| [ROOT, parsed] | [] | parsed$\to$correctly | RIGHT-ARC |
| [ROOT] | [] | ROOT$\to$parsed | RIGHT-ARC |

(b)

$O(n)$: each of the $n$ words is shifted exactly once and receives exactly one incoming arc, so a full parse takes $2n$ transitions. For the 5-word sentence in (a), that is 5 SHIFTs plus 5 arc transitions, i.e. 10 steps.

(c)

`PartialParse.__init__`

```python
### YOUR CODE HERE (3 Lines)
### Your code should initialize the following fields:
###     self.stack: The current stack represented as a list with the top of the stack as the
###                 last element of the list.
###     self.buffer: The current buffer represented as a list with the first item on the
###                  buffer as the first item of the list
###     self.dependencies: The list of dependencies produced so far. Represented as a list of
###             tuples where each tuple is of the form (head, dependent).
###             Order for this list doesn't matter.
###
### Note: The root token should be represented with the string "ROOT"
###
self.stack = ["ROOT"]
self.buffer = sentence[:]  # copy, so parsing does not mutate the input sentence
self.dependencies = []
### END YOUR CODE
```
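
A quick check that this reproduces the "Initial Configuration" row of the table in part (a):

```python
pp = PartialParse(["I", "parsed", "this", "sentence", "correctly"])
print(pp.stack)         # ['ROOT']
print(pp.buffer)        # ['I', 'parsed', 'this', 'sentence', 'correctly']
print(pp.dependencies)  # []
```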

`parse_step`

```python
### YOUR CODE HERE (~7-10 Lines)
### TODO:
###     Implement a single parsing step, i.e. the logic for the following as
###     described in the pdf handout:
###         1. Shift
###         2. Left Arc
###         3. Right Arc
if transition == "S":
    # move the first word of the buffer onto the stack
    word = self.buffer.pop(0)
    self.stack.append(word)
elif transition == "LA":
    # the second-to-top word becomes a dependent of the top word
    self.dependencies.append((self.stack[-1], self.stack[-2]))
    self.stack.pop(-2)
else:  # "RA"
    # the top word becomes a dependent of the second-to-top word
    self.dependencies.append((self.stack[-2], self.stack[-1]))
    self.stack.pop(-1)
### END YOUR CODE
```
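
Replaying the full transition sequence from part (a) should then recover all five dependencies:

```python
pp = PartialParse(["I", "parsed", "this", "sentence", "correctly"])
for t in ["S", "S", "LA", "S", "S", "LA", "RA", "S", "RA", "RA"]:
    pp.parse_step(t)
print(pp.stack)         # ['ROOT']
print(pp.dependencies)  # [('parsed', 'I'), ('sentence', 'this'), ('parsed', 'sentence'),
                        #  ('parsed', 'correctly'), ('ROOT', 'parsed')]
```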

(d)

`minibatch_parse`

```python
### YOUR CODE HERE (~8-10 Lines)
### TODO:
###     Implement the minibatch parse algorithm as described in the pdf handout
###
###     Note: A shallow copy (as denoted in the PDF) can be made with the "=" sign in python, e.g.
###                 unfinished_parses = partial_parses[:].
###             Here `unfinished_parses` is a shallow copy of `partial_parses`.
###             In Python, a shallow copied list like `unfinished_parses` does not contain new instances
###             of the object stored in `partial_parses`. Rather both lists refer to the same objects.
###             In our case, `partial_parses` contains a list of partial parses. `unfinished_parses`
###             contains references to the same objects. Thus, you should NOT use the `del` operator
###             to remove objects from the `unfinished_parses` list. This will free the underlying memory that
###             is being accessed by `partial_parses` and may cause your code to crash.
partial_parses = [PartialParse(sentence) for sentence in sentences]
unfinished_parses = partial_parses[:]  # shallow copy

while unfinished_parses:
    # take (at most) batch_size parses; the slice is a new list, so removing
    # finished parses from unfinished_parses inside the loop below is safe
    batch = unfinished_parses[:batch_size]
    transitions = model.predict(batch)
    for parse, transition in zip(batch, transitions):
        parse.parse_step(transition)
        # a parse is finished once the buffer is empty and only ROOT remains on the stack
        if len(parse.stack) == 1 and len(parse.buffer) == 0:
            unfinished_parses.remove(parse)
dependencies = [parse.dependencies for parse in partial_parses]
### END YOUR CODE
```
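
To exercise `minibatch_parse` without a trained network, a hypothetical `DummyModel` (not part of the starter code) can shift until the buffer is empty and then emit right-arcs:

```python
class DummyModel:
    def predict(self, partial_parses):
        # hypothetical stand-in: SHIFT while the buffer is non-empty, else RIGHT-ARC
        return ["S" if pp.buffer else "RA" for pp in partial_parses]

deps = minibatch_parse([["right", "arcs", "only"]], DummyModel(), batch_size=2)
print(deps)  # [[('arcs', 'only'), ('right', 'arcs'), ('ROOT', 'right')]]
```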

(e)

`ParserModel.__init__`

```python
# feed-forward network: embeddings -> hidden (ReLU) -> logits, with Xavier init
self.embed_to_hidden = nn.Linear(self.n_features * self.embed_size, self.hidden_size)
nn.init.xavier_uniform_(self.embed_to_hidden.weight)
self.dropout = nn.Dropout(self.dropout_prob)
self.hidden_to_logits = nn.Linear(self.hidden_size, self.n_classes)
nn.init.xavier_uniform_(self.hidden_to_logits.weight)
```

`embedding_lookup`

```python
# t: (batch_size, n_features) word indices -> (batch_size, n_features, embed_size)
x = self.pretrained_embeddings(t)
# flatten the feature embeddings into one vector per example:
# (batch_size, n_features * embed_size)
x = x.view(x.size()[0], -1)
```

`forward`

```python
embeddings = self.embedding_lookup(t)
hidden = self.embed_to_hidden(embeddings)
hidden = F.relu(hidden)  # assuming torch.nn.functional is imported as F, as in the starter code
hidden = self.dropout(hidden)
logits = self.hidden_to_logits(hidden)
```
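
Assuming the assignment's default hyperparameters (36 features, 50-dimensional embeddings, hidden size 200, 3 transition classes), the tensor shapes through `forward` are:

```python
# t:          (batch_size, 36)       indices of the extracted features
# embeddings: (batch_size, 36 * 50)  after lookup + flatten in embedding_lookup
# hidden:     (batch_size, 200)      after embed_to_hidden -> ReLU -> dropout
# logits:     (batch_size, 3)        scores for SHIFT / LEFT-ARC / RIGHT-ARC
```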

`train`

```python
optimizer = optim.Adam(parser.model.parameters(), lr=lr)
loss_func = nn.CrossEntropyLoss()
```

`train_for_epoch`

```python
optimizer.zero_grad()  # clear gradients accumulated from the previous minibatch
logits = parser.model(train_x)
loss = loss_func(logits, train_y)
loss.backward()
optimizer.step()
```

The results are as follows:

dev UAS: 88.38

test UAS: 88.90

(f)

Skipped.